Abstract. The malfunction of safety-critical systems may cause damage
to people and the environment. Software within those systems is
rigorously designed and verified according to domain-specific guidance,
such as ISO 26262 for automotive safety. This paper describes academic
and industrial co-operation in tool development to support one of the
most stringent of the requirements: achieving full code coverage in
requirements-driven testing.

We present a verification workflow supported by a tool that integrates
the coverage measurement tool RapiCover with the test-vector generator
FShell. The tool assists in closing the coverage gap by providing the
engineer with test vectors that help in debugging coverage-related code
quality issues and creating new test cases, as well as justifying the
presence of unreachable parts of the code in order to finally achieve full
effective coverage according to the required criteria.

To illustrate the practical utility of the tool, we report on its
application to a case study from the automotive industry.


1 Introduction 

Software within safety-critical systems must undergo strict design and
verification procedures prior to its deployment. The recently published
ISO 26262 standard [1] describes the safety life cycle for electrical,
electronic and software components in the automotive domain. Different
activities are required at different stages of the life cycle, helping
ensure that system safety requirements are met by the implemented design.
The rigor with which these are carried out depends on the severity of the
consequences of failure of the various components. Components with
automotive safety integrity level (ASIL) D have the most stringent
requirements, and ASIL A the least strict. One of the key required
activities for software is to demonstrate the extent to which testing has
exercised source code, also known as code coverage. This can be a
challenging and expensive task [3], with much manual input required to
achieve adequate coverage results.

This paper presents work undertaken within the Verification and Testing to 
Support Functional Safety Standards (VeTeSS) project, which is developing new 

* The research leading to these results has received funding from the ARTEMIS Joint 
Undertaking under grant agreement number 295311 “VeTeSS” 




Type                    Description                                 ASIL
----------------------  ------------------------------------------  -------------------
Function (arch level)   Each function in the code is exercised      A, B (R); C, D (HR)
                        at least once.
Statement               Each statement in the code is exercised     A, B (HR); C, D (R)
                        at least once.
Branch                  Each branch in the code has been            A (R); B, C, D (HR)
                        exercised for every outcome at least once.
MC/DC                   Each possible condition must be shown to    A, B, C (R); D (HR)
                        independently affect a decision's outcome.

Table 1. ISO 26262 Coverage Requirements (HR = Highly Recommended,
R = Recommended)


tools and processes to meet ISO 26262. The main contribution of this paper
is an integration of the FShell tool [5] with an industrial code coverage
tool (RapiCover) in order to generate extra test cases and increase code
coverage results. An additional contribution is a discussion of how this
technology might be most appropriately used within the safety life cycle.
Achieving 100% code coverage can be a complex and difficult task, so tools
to assist the process are desirable. However, there is a need to ensure
that any additional automatically generated tests still address system
safety requirements.

Safety standards require different depths of coverage depending on the ASIL
of the software. The requirements of ISO 26262 are summarized in Tab. 1.
The aim of requirements-based software testing is to ensure that each
required type of coverage reaches 100%. In practice this can be extremely
difficult: for example, it can be hard to provide test vectors for
defensive code. Another example is code that may be deactivated in
particular modes of operation. Sometimes the cause of missing coverage is
not obvious even after manual review. In this situation, generating test
vectors automatically can benefit the user by providing faster turnaround
and improved coverage results.

This paper is laid out as follows. In Sec. 2 we provide background to the
coverage problem being tackled, and criteria for success. In Sec. 3 we
describe the specific tool integration. Sec. 4 describes an industrial
automotive case study. Sec. 5 looks at both previous work and some of the
lessons learned from the implementation experience, and suggested
improvements. Finally we present conclusions and further work.

The contribution of this paper is by and large of a practical nature: the
integration of formal-methods-based tools with industrial testing software.
In the safety-critical domain these two areas are generally separated from
one another, with formal methodology used only for small and critical
sections of software to prove correctness and viewed as an expensive
procedure. In some cases the methods are seen as directly at odds with one
another [5]. The tool is at a prototype stage of development, and the
authors are working with industrial partners to assess future improvements
to prepare its commercialization, as described in Sec. 5.





2 Assisted Coverage Closure 


Testing has to satisfy two objectives: it has to be effective, and it has to be 
cost-effective. Testing is effective if it can distinguish a correct product from one 
that is incorrect. Testing is cost-effective if it can achieve all it needs to do at 
the lowest cost (which usually means the fewest tests, least amount of effort and 
shortest amount of time). 

Safety standards like ISO 26262 and DO-178B/C demand requirements-driven
testing to increase confidence in correct behavior of the software
implemented. Correct behavior means that the software implements the
behavior specified in the requirements and that it does not implement any
unspecified behaviors. As a quality metric they demand the measurement of
coverage according to certain criteria, such as those listed in Tab. 1.
The rationale behind using code coverage as a quality metric for assessing
the achieved requirements coverage of a test suite is the following:
Suppose we have a test suite that presumably covers each case in the
requirements specification. Then missing or erroneously implemented
features may be observed by failing test cases, whereas a lack of
coverage, e.g. according to the MC/DC criterion, indicates that there is
behavior in the software which is not exercised by a test case. This may
hint at the following software and test quality problems:

(A) Some cases in the requirements specification have been forgotten. These 
requirements have to be covered by additional test cases. 

(B) Features have been implemented that are not needed. Unspecified features 
are not allowed in safety-critical software and have to be removed. 

(C) The requirements specification is too vague or ambiguous to describe a 
feature completely. The specification must be disambiguated and refined. 

(D) Parts of the code are unreachable. The reasons may be: 

(1) A programming error that has to be fixed. 

(2) Code generated from high-level models often contains unreachable code 
if the code generator is unable to eliminate infeasible conditionals. 

(3) It may actually be intended in case of defensive programming and error 
handling. 

In case (3), fault injection testing is required to exercise these
features [9]. Depending on the policy regarding unreachable code, case (2)
can be handled through justification of non-coverability, tuning the model
or the code generator, or post-processing of generated code.

The difficulty for the software developer lies in distinguishing the above
cases. This is an extremely time-consuming and, hence, expensive task that
calls for tool assistance.
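
Case (D)(3) can be made concrete with a small sketch. The enum and function below are hypothetical and not taken from the case study; they only show why a defensive branch stays uncovered under requirements-based testing and needs fault injection (here, passing a corrupted value) to be exercised.

```c
#include <stdbool.h>

/* Hypothetical gear selector states (illustration only). */
typedef enum { GEAR_P = 0, GEAR_R = 1, GEAR_N = 2, GEAR_D = 3 } gear_t;

/* Returns true if the park lock may be engaged for the given gear.
 * The default branch is defensive: no well-formed input reaches it,
 * so tests derived from the requirements leave it uncovered. */
bool park_lock_allowed(gear_t gear)
{
    switch (gear) {
    case GEAR_P:
        return true;
    case GEAR_R:
    case GEAR_N:
    case GEAR_D:
        return false;
    default:
        /* Only reachable with a corrupted value, i.e. by fault injection. */
        return false;
    }
}
```

Calling `park_lock_allowed((gear_t)42)` plays the role of an injected fault: it is the only kind of call that reaches the `default` branch.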

2.1 Coverage Closure Problem 

Given 

— an implementation under test (e.g. C code generated from a Simulink model), 

— an initial test suite (crafted manually or generated by some other test suite 
generation techniques), and 


— a coverage criterion (e.g. MC/DC), 

we aim at increasing effective test coverage by automatically 

— generating test vectors that help the developer debug the software in order 
to distinguish above reasons (A)-(D) for missing coverage; 

— in particular, suggesting additional test vectors that help the developer create 
test cases to complete requirements coverage in case (A); 

— proving infeasibility of non-covered code, thus giving evidence for arguing 
non-coverability. 

Note that safety standards like DO-178C allow only requirements-driven
test-case generation and explicitly forbid achieving full structural code
coverage by blindly applying automated test-vector generation. This can
easily lead to confusion if the distinction between test-case generation
and test-vector generation is not clearly made. Test-vector generation can
be applied blindly to achieve full coverage, but it is of no use by itself.
A test vector is only a part of a test case because it lacks the element
that provides information about the correctness of the software, i.e. the
expected test result. Only the requirements can tell the test engineer what
the expected test result has to be. Test-case generation is thus always
based on the requirements (or a formalized model thereof if available). Our
objective is to provide assistance for test-case generation to bridge the
coverage gap.
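
The distinction between a test vector and a test case can be sketched in C. All type, field and function names here are assumptions made for illustration; the function under test is loosely modelled on the first requirement quoted in Sec. 4.1 and is not the case-study code.

```c
#include <stdbool.h>

/* A test vector holds inputs only; this is what a generator can produce. */
typedef struct {
    int  speed_kmh;
    bool park_button;
    bool brake_pedal;
} test_vector_t;

/* A test case adds the expected result, which must come from the
 * requirements and cannot be derived from the code alone. */
typedef struct {
    test_vector_t input;
    bool          expected_park_engaged;
} test_case_t;

/* Hypothetical function under test. */
bool park_engaged(const test_vector_t *in)
{
    return in->speed_kmh < 6 && in->park_button && in->brake_pedal;
}

/* A test case passes if the observed result matches the expected one. */
bool run_test_case(const test_case_t *tc)
{
    return park_engaged(&tc->input) == tc->expected_park_engaged;
}
```

A generated vector such as `{3, true, true}` exercises the code, but only the engineer, reading the requirement, can supply the `expected_park_engaged` field that turns it into a test case.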

2.2 Coverage Measurement 

Combining a test-case generator with a coverage tool provides immediate
access to the test vectors needed to obtain the level of coverage required
for the targeted qualification level.

Coverage tools determine which parts of the code have been executed by using 
instrumentation. Instrumentation points are automatically inserted at specific 
points in the code. If an instrumentation point is executed, this is recorded in its 
execution data. After test completion, the coverage tool analyzes the execution 
data to determine which parts of the source code have been executed. The tool 
then computes the level of coverage achieved by the tests. 
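
The principle can be sketched in a few lines of C. This is not RapiCover's instrumentation library; the names `ipoint`, `hit` and `coverage_percent` are hypothetical and the example only shows how recorded execution data turns into a coverage figure.

```c
#include <stddef.h>

#define NUM_POINTS 4

/* Execution data: one hit flag per instrumentation point. */
static unsigned char hit[NUM_POINTS];

/* Records that the instrumentation point with the given id was executed. */
static void ipoint(size_t id)
{
    if (id < NUM_POINTS)
        hit[id] = 1;
}

/* Toy function with instrumentation points inserted by hand. */
int clamp_nonneg(int x)
{
    ipoint(0);
    if (x < 0) {
        ipoint(1);
        x = 0;
    } else {
        ipoint(2);
    }
    ipoint(3);
    return x;
}

/* Percentage of instrumentation points hit so far (rounded down). */
unsigned coverage_percent(void)
{
    unsigned n = 0;
    for (size_t i = 0; i < NUM_POINTS; i++)
        n += hit[i];
    return 100u * n / NUM_POINTS;
}
```

Running `clamp_nonneg` with a non-negative argument only leaves point 1 unhit; a second run with a negative argument closes the gap.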

We use the coverage tool RapiCover, which is part of the RVS tool suite 
developed by Rapita Systems Ltd. 


2.3 Test Vector Generation by Bounded Model Checking 


We use the test vector generator, FShell [5] (see Sec. 3.2 for details), which is 
based on the Software Bounded Model Checker for C programs, CBMC [3]. 

Viewing a program as a transition system with initial states described by
the propositional formula $\mathit{Init}$, and the transition relation
$\mathit{Trans}$, Bounded Model Checking (BMC) [2] can be used to check the
existence of a path $\pi$ of length $k$ from $\mathit{Init}$ to another set
of states described by the formula $\psi$. This check is performed by
deciding satisfiability of the following formula using a SAT or SMT solver:

$$\mathit{Init}(s_0) \;\land \bigwedge_{0 \le j < k} \mathit{Trans}(s_j, i_j, s_{j+1}) \;\land\; \psi(s_k) \qquad (1)$$



[Figure: loop connecting the initial test suite, the test suite, new test
vectors and new test cases]

Fig. 1. The Coverage Closure Process


If the solver returns the answer "satisfiable", it also provides a
satisfying assignment to the variables
$(s_0, i_0, s_1, i_1, \ldots, i_{k-1}, s_k)$. The satisfying assignment
represents one possible path $\pi = (s_0, s_1, \ldots, s_k)$ from
$\mathit{Init}$ to $\psi$ and identifies the corresponding input sequence
$(i_0, \ldots, i_{k-1})$.

Besides being useful for refuting safety properties (where $\psi$ defines
the error states), BMC can be used for generating test vectors (where
$\psi$ defines the test goal to be covered).

The analysis performed by CBMC is bit-exact w.r.t. the machine semantics
of the execution target, and CBMC provides full bit-exact support for
floating point arithmetic. Architecture-specific settings can be configured
via the command line in FShell, and RapiCover supports on-target coverage
measurement. We are hence guaranteed that the generated test vectors are
going to cover the test goals. In addition, using BMC in a test-vector
generator permits generating the shortest test vectors possible to cover a
certain test goal or even a whole group of test goals, which helps keep
test suites concise and test execution fast. A further advantage of using a
model checker is its ability to find test vectors for corner cases ("Under
which conditions can this floating point variable take the value NaN?").
Moreover, in our experience, the high precision of the analysis makes it
quite likely that inconsistencies and holes in the requirements
specification are discovered during test-vector generation.

Finally, BMC can give a proof of unreachability of a test goal if loops can
be unrolled completely; otherwise, k-induction [15], a BMC-based technique
for unbounded model checking, can be used to attempt a proof.
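
The meaning of formula (1) can be illustrated with a deliberately naive sketch. It replaces the SAT/SMT query by explicit enumeration of all input sequences of length k, which is not how CBMC or FShell work internally. The toy transition system (`init_state`, `trans`, `goal`) is an assumption made purely for illustration.

```c
#include <stdbool.h>

#define MAX_K 16

/* Toy transition system: the state is a counter, the input is one bit
 * per step. Input 1 increments the counter, input 0 resets it. */
static int  init_state(void)        { return 0; }
static int  trans(int s, int input) { return input ? s + 1 : 0; }
static bool goal(int s)             { return s == 3; }  /* plays the role of psi */

/* Looks for an input sequence of exactly k steps leading from the initial
 * state to a goal state. On success the inputs are stored in out[0..k-1]. */
bool find_vector_of_length(int k, int out[])
{
    if (k < 0 || k > MAX_K)
        return false;
    for (unsigned long bits = 0; bits < (1ul << k); bits++) {
        int s = init_state();
        for (int j = 0; j < k; j++)
            s = trans(s, (int)((bits >> j) & 1ul));
        if (goal(s)) {
            for (int j = 0; j < k; j++)
                out[j] = (int)((bits >> j) & 1ul);
            return true;
        }
    }
    return false;
}

/* Increases the bound step by step, so the first hit is a shortest vector.
 * Returns its length, or -1 if no vector exists up to max_k. */
int shortest_vector(int max_k, int out[])
{
    for (int k = 0; k <= max_k && k <= MAX_K; k++)
        if (find_vector_of_length(k, out))
            return k;
    return -1;
}
```

Starting at a small bound and increasing it mirrors the way a bounded model checker yields short test vectors; a return value of -1 only says that no vector exists within the bound, not that the goal is unreachable.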


2.4 The Coverage Closure Process 

The algorithm that we implement to assist the coverage closure process is
shown in Fig. 1. It proceeds as follows:

1. We start with an initial test suite that has been crafted manually or has been 
generated using other test-case generation techniques like directed random 
testing. The initial test suite may be empty, but many test goals can be easily 
covered using test-case generation methods that are cheaper than Bounded 
Model Checking. It is thus recommended to start with such a base test suite. 




























2. In the next step, this test suite is run using the coverage measurement tool 
in order to obtain a list of non-covered test goals. Coverage measurement can 
be performed on a developer machine to obtain approximate coverage, but 
final certification data has to be obtained by running the test suite on the 
actual target platform. 

3. The test-vector generator takes the list of non-covered test goals and tries 
to compute input values to cover them. Ideally, the test-vector generator 
is parametrized with the architectural parameters of the target platform in 
order to obtain guarantees that the goals are indeed going to be covered. As 
our test-vector generator is a Bounded Model Checker, there will be three 
possible outcomes of an attempt to cover test goals: 

(a) A test goal has been covered. In this case the new test vector is
presented to the user, who has to turn it into a new test case to be added
to the test suite. Note that building the new test case is the only part of
the process (bold edge in Fig. 1) that is not fully automatic, since human
judgment is required to identify why the corresponding test goal has not
been covered in the first place, i.e. distinguishing reasons (A)-(D) in
Sec. 2.

(b) It is infeasible to cover a test goal. This happens when the test-vector 
generator comes up with a proof of unreachability of the test goal. As 
mentioned above, a Bounded Model Checker can provide such proofs 
if the loops have been unwound completely, for instance. In this case, 
the corresponding test goal can be annotated in the coverage report as 
proven infeasible to justify its non-coverability. This increases effective 
coverage by reducing the number of genuinely coverable test goals. 

(c) The goal has not been covered and we were unable to prove infeasibility
of the test goal. With a Bounded Model Checker this can happen if the
chosen bound k was too low. In this case the test goal remains uncovered,
and another attempt to cover it can be made with a higher value of k in the
next iteration of the process.

4. Coverage of the enhanced test suite is then measured again to identify
test goals that remain uncovered, and the process is repeated. Generated
tests will typically cover more test goals than intended. Measuring
coverage between test generation runs increases the cost-effectiveness of
the process by eliminating unnecessary test-case generations.

5. If there are no more non-covered test goals we have achieved full coverage 
and the process terminates. 

Note that the process depicted in Fig. 1 is not specific to our tool but
applies in general. In particular, it does not rely on the test-vector
generator to guarantee that a generated test vector covers the test goal it
has been generated for, because the coverage measurement tool checks
whether each generated test case actually increases the coverage. However,
the generation of useless test cases can be avoided by using a tool such as
FShell that can provide such guarantees.

Then, in theory, termination of the process achieving full coverage can be 
guaranteed, because embedded software is finite state. In practice, however, this 


depends on the reachability diameter of the system [12] and the capacity of the 
test-vector generator to cope with the system’s size and complexity. 
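
The control flow of the process, including the three outcomes (a)-(c), can be sketched as follows. This is a toy simulation under stated assumptions: `try_goal` is a stand-in for the test-vector generator, each goal is described by made-up parameters, and the manual step of turning a vector into a test case is left out.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { GOAL_OPEN, GOAL_COVERED, GOAL_INFEASIBLE } goal_status_t;

/* Toy goal model: a reachable goal needs a vector of at least min_len
 * steps; an unreachable goal can be proven infeasible once the bound
 * reaches full_unwind (all loops completely unrolled). */
typedef struct {
    bool          reachable;
    int           min_len;      /* used if reachable */
    int           full_unwind;  /* used if not reachable */
    goal_status_t status;
} goal_t;

/* Stand-in for the test-vector generator run with bound k. */
static goal_status_t try_goal(const goal_t *g, int k)
{
    if (g->reachable)
        return (k >= g->min_len) ? GOAL_COVERED : GOAL_OPEN;   /* (a) or (c) */
    return (k >= g->full_unwind) ? GOAL_INFEASIBLE : GOAL_OPEN; /* (b) or (c) */
}

/* Repeats generation with increasing bound until no goal is open or
 * max_k is reached. Returns the number of goals that remain open. */
size_t close_coverage(goal_t goals[], size_t n, int max_k)
{
    size_t open = n;
    for (size_t i = 0; i < n; i++)
        goals[i].status = GOAL_OPEN;
    for (int k = 1; k <= max_k && open > 0; k++) {
        for (size_t i = 0; i < n; i++) {
            if (goals[i].status != GOAL_OPEN)
                continue;               /* already covered or justified */
            goals[i].status = try_goal(&goals[i], k);
            if (goals[i].status != GOAL_OPEN)
                open--;
        }
    }
    return open;
}
```

Goals that end up `GOAL_INFEASIBLE` correspond to entries annotated as proven infeasible in the coverage report; goals still `GOAL_OPEN` at `max_k` are the ones for which neither a vector nor a proof was found.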


3 FShell plugin for RVS Implementation 


The input to the tool is a C program with an initial test suite. The output
of the tool is twofold. The first output is a set of generated test vectors
that augment the initial test suite to increase its coverage. The second
output is a coverage report detailing the level of coverage achieved by the
initial test suite, and the extra coverage added by the generated test
cases.

FShell has been integrated into RapiCover as a context menu option, shown
in Fig. 3. RapiCover can be used to select a single function, call,
statement, decision or branch. The tool then uses FShell to generate a test
vector for this element. Alternatively, the tool has a button to generate
as much coverage as possible. When this option is chosen, the tool goes
around the loop described in Fig. 1, using FShell to repeatedly generate
test cases to increase the coverage as much as possible, verifying the
obtained coverage with RapiCover.

There is tension between demonstrating that the activities prescribed by
ISO 26262 have been met in spirit and meeting its quantifiable criteria.
Recall that achieving 100% code coverage during testing does not ensure
that the code meets its intent. Consequently the FShell plug-in would be
provided as an advisory service, generating candidate test vectors which a
user can examine to help identify why the planned testing was inadequate.
Generated values need to be assessed for validity with respect to the
system under test, i.e. they must reflect real-world values that could be
input to a function, e.g. from a sensor.



Fig. 2. RVS Process 


RVS is licensed software. An evaluation version can be requested from
http://www.rapitasystems.com. The licensing policy disallows anonymous
licenses. To compensate for this, we provide a video showing the plug-in
here: https://drive.google.com/file/d/0B7xeLJ8vk3W8Y094TVc4Rmh0S0k

































Fig. 3. Screenshot of RapiCover with the FShell Plug-in 


3.1 Introduction to RapiCover 

RapiCover uses instrumentation to determine which program parts have been
executed. Instrumentation points are automatically inserted at specific
points in the code. Execution of an instrumentation point is recorded in
its execution data. Upon test completion, RapiCover analyzes the execution
data to determine which instrumentation points have been hit.

The first step in the RapiCover analysis process is to create an
instrumented build of the application ((1) in Fig. 2). RapiCover
automatically adds instrumentation points ((2) in Fig. 2) to the source
code.

The instrumentation code itself takes the form of very lightweight
measurement code that is written for each target to ensure minimal impact
on the performance of the software, and to support on-target testing for
environments with limited resources. The instrumented software and possibly
an instrumentation library are compiled and linked using the standard
compiler tool chain. The executable produced is then downloaded onto the
target hardware. The executable is exercised and instrumentation data ((3)
in Fig. 2) is generated and retrieved. This data is used to generate
coverage metrics.


3.2 Introduction to FShell 

FShell is an extended testing environment for C programs supporting a rich
scripting language interface. FShell's interface is designed as a database
engine, dispatching queries about the program to various program analysis
tools. These queries are expressed in the FShell Query Language (FQL).
Users formulate test specifications and coverage criteria, challenging
FShell to produce test suites and input assignments covering the requested
patterns. The program supports a rich and extensive interface. The
expressions used for the FShell plugin for RVS implementation are listed in
Tab. 2 with syntax and examples.

Expression Name   Syntax       Example
---------------   ----------   ---------------------
Function Call     @CALL(...)   @CALL(X)
Concatenation     .            @CALL(X).@CALL(Y)
Sequence          ->           @CALL(X)->@CALL(Y)
Negation          "NOT(...)"   "NOT(@CALL(X))"
Repetition        *            @CALL(X)*
Alternative       +            (@CALL(X) + @CALL(Y))

Table 2. FShell expressions

@CALL(X) requires generated test cases to call function X. This is the only
primitive expression used in the module. The concatenation operator . joins
two expressions, requiring them to be satisfied in immediate succession. As
an example, a test case generated by @CALL(X).@CALL(Y) covers a call to X
immediately followed by a call to Y. This is similar to the sequence
operator ->, which only requires the second call to occur eventually.
@CALL(X)->@CALL(Y) is thus fulfilled if a call to X is eventually followed
by a call to Y. The negation "NOT(@CALL(X))" is satisfied by every
statement except a call to function X. The repetition operator is
implemented along the lines of its regular expression counterpart, such
that @CALL(X)* is satisfied by a series of calls to X. Finally, the
alternative operator implements logical disjunction, such that
(@CALL(X) + @CALL(Y)) will be satisfied if either a call to X or a call to
Y occurs.

The expressions and operators above are all that is used by the FShell
plug-in to generate the test vectors requested by RapiCover. Sec. 3.3
illustrates how these expressions are used to convert test goals to
equivalent FQL queries.

RapiCover: http://www.rapitasystems.com/products/rapicover
FShell is available from: http://forsyte.at/software/fshell
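
The difference between the concatenation and the sequence operator can be illustrated on a recorded call trace. The sketch below is not FShell's implementation and evaluates nothing but the two patterns discussed above; it follows the informal reading given in this section, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A trace is the ordered list of function names called in one test run. */

/* Reading of @CALL(x).@CALL(y): some call to x is immediately followed
 * by a call to y. */
bool matches_concat(const char *trace[], size_t n,
                    const char *x, const char *y)
{
    for (size_t i = 0; i + 1 < n; i++)
        if (strcmp(trace[i], x) == 0 && strcmp(trace[i + 1], y) == 0)
            return true;
    return false;
}

/* Reading of @CALL(x)->@CALL(y): some call to x is eventually followed
 * by a call to y, with arbitrary calls in between. */
bool matches_sequence(const char *trace[], size_t n,
                      const char *x, const char *y)
{
    bool seen_x = false;
    for (size_t i = 0; i < n; i++) {
        if (seen_x && strcmp(trace[i], y) == 0)
            return true;
        if (strcmp(trace[i], x) == 0)
            seen_x = true;
    }
    return false;
}
```

For the trace X, Z, Y the sequence pattern matches while the concatenation pattern does not, because the call to Z sits between the two.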


3.3 Use of FShell within RapiCover 



The FShell plugin for RVS translates test goals requested by RapiCover into
FQL queries covering these goals in FShell, as illustrated in Fig. 4.

Fig. 4. Architecture of FShell plugin for RVS

Test goals are specified using marker elements from the RapiCover
instrumentation, which can identify arbitrary statements in the source code
by assigning them an instrumentation point id. In accordance with MC/DC
criteria, decisions and their constituent conditions are further identified
using unique decision and condition point ids.

Fig. 5 shows an example program before and after RapiCover instrumentation.

Before instrumentation:

    int main() {
      // ...
      if (a == b || b != c) {
        printf("%d %d\n", a, b);
      }
      return 0;
    }

After instrumentation:

    int main() {
      // ...
      Ipoint(1);
      if (Ipoint(4, Ipoint(2, a == b) ||
                    Ipoint(3, b != c))) {
        Ipoint(5);
        printf("%d %d\n", a, b);
      }
      Ipoint(6);
      return 0;
    }

Fig. 5. Code example before and after RapiCover instrumentation

The module supports two categories of test goals: Instrumentation Point
Path Test Goals and Condition Test Goals. The former specifies a simple
series of instrumentation points to be covered by FShell. The system also
permits inclusive-or and negation operators in instrumentation point paths,
which makes it possible to specify a choice of instrumentation points to be
covered or to make sure that a requested instrumentation point is not
covered by the provided test vector. As an example, the instrumentation
point path 1->5->6 in Fig. 5 is only covered if the decision in the if
statement evaluates to true. Conversely, the path 1->NOT(5)->6 is only
covered if it evaluates to false. The former can be achieved with inputs
a=1, b=1, c=2, whereas the latter could be covered using the input vector
a=1, b=2, c=2. Condition Test Goals, on the other hand, are specified by a
single decision point and multiple condition points, as well as the desired
truth value for each decision and condition. This allows us to cover branch
conditions with precise values for their sub-conditions. As an example, the
condition test goal (4,true) -> (2,false) -> (3,true) would be covered by
the input vector a=1, b=2, c=3.
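
The three input vectors above can be checked with a small executable version of the instrumented decision. This is a sketch, not RapiCover's instrumentation: the figure uses the name Ipoint with one or two arguments, whereas plain C needs two differently named helpers, so `ipoint` and `ipoint_val` are assumptions, as is the trace buffer.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACE 16

static int    trace_ids[MAX_TRACE];
static size_t trace_len;

static void record(int id)
{
    if (trace_len < MAX_TRACE)
        trace_ids[trace_len++] = id;
}

/* Statement-style instrumentation point. */
static void ipoint(int id) { record(id); }

/* Expression-style point: records the id and passes the value through. */
static bool ipoint_val(int id, bool value) { record(id); return value; }

/* The instrumented decision from the figure, without the printf call.
 * Clears the trace, runs once, and returns the decision outcome. */
bool run_example(int a, int b, int c)
{
    bool taken = false;
    trace_len = 0;
    ipoint(1);
    if (ipoint_val(4, ipoint_val(2, a == b) || ipoint_val(3, b != c))) {
        ipoint(5);
        taken = true;
    }
    ipoint(6);
    return taken;
}

/* True if the given point was recorded during the last run. */
bool was_hit(int id)
{
    for (size_t i = 0; i < trace_len; i++)
        if (trace_ids[i] == id)
            return true;
    return false;
}
```

With a=1, b=1, c=2 point 5 is hit and, because of short-circuit evaluation, condition point 3 is not; with a=1, b=2, c=2 point 5 is skipped; with a=1, b=2, c=3 the decision is true through condition 3 alone.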

The instrumentation elements introduced by RapiCover need to be mapped to
an equivalent FQL query using the features presented in Tab. 2. For this
purpose, we replace their default implementation in RapiCover by
synthesized substitutions which are optimized for efficient tracking by
FShell. These mock implementations are synthesized for each query and
injected into the program on-the-fly at analysis time. Standard FQL queries
are then enough to examine these augmented models for the specified
coverage goals. Tab. 3 shows explicitly how these goals can be described
using the FShell query syntax.

Category          Goal          FQL
----------------  ------------  ----------------------------------------
Instrumentation   Simple        @CALL(Ipoint5)->@CALL(Ipoint6)->...
Point Path Goal   Disjunction   (@CALL(Ipoint5) + @CALL(Ipoint6) + ...)
                  Complement    @CALL(Ipoint1)."NOT(@CALL(Ipoint5))*".
                                @CALL(Ipoint6)->...
Condition Goal    Condition     @CALL(Ipoint2f)."NOT(@CALL(Ipoint1))*".
                                @CALL(Ipoint2t)."NOT(@CALL(Ipoint1))*".
                                +...
                  Decision      @CALL(Ipoint4t)

Table 3. Test Goal Types and FShell Queries
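
The "Simple" row of the table can be turned into a small string-building sketch. The function below is hypothetical and not part of the plug-in; it only assembles the textual pattern shown in the table for a given list of instrumentation point ids.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes "@CALL(Ipoint<id0>)->@CALL(Ipoint<id1>)->..." into buf.
 * Returns 0 on success, -1 if n is 0 or the buffer is too small. */
int build_simple_path_query(const int ids[], size_t n,
                            char *buf, size_t buf_size)
{
    size_t used = 0;

    if (n == 0 || buf_size == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* snprintf returns the length it would have written. */
        int w = snprintf(buf + used, buf_size - used, "%s@CALL(Ipoint%d)",
                         i == 0 ? "" : "->", ids[i]);
        if (w < 0 || (size_t)w >= buf_size - used)
            return -1;  /* output was truncated */
        used += (size_t)w;
    }
    return 0;
}
```

For the path 1, 5, 6 this yields the text `@CALL(Ipoint1)->@CALL(Ipoint5)->@CALL(Ipoint6)`, which then has to be embedded in a complete FQL query before it can be passed to FShell.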


4 Evaluation 

The FShell plugin for RVS has been tested on an industrial automotive use
case, the software of a park lock controller.


4.1 Case Study: e-Shift Park Control Unit 


To illustrate the features and utility of the tool, we applied it to the
software of an e-Shift Park Control Unit. This system is in charge of the
management of the mechanical park lock that blocks or unblocks the
transmission to avoid unwanted movement of the vehicle when stopped. The
park mode is enabled either by command of the driver via the gear lever
(PRND: park/reverse/neutral/drive) or automatically.

Fig. 6 shows the architectural elements the e-Park system is communicating
with. The vehicle control unit monitors the status of the vehicle via
sensors and informs the driver via the dashboard, in particular about the
speed of the vehicle and the status of the gears. The e-Park Control Unit
is responsible for deciding when to actuate the mechanical park lock
system.

Fig. 6. Case Study: e-Shift Park Control Unit

Among many others, the following requirements have to be fulfilled:

1. Parking mode is engaged if vehicle speed is below 6 km/h and the driver
presses the parking button (P) and brake pedal.

2. If vehicle speed is above 6 km/h and the driver presses the parking
button (P) and brake pedal then commands from the accelerator pedal are
ignored; parking mode is activated as soon as speed decreases below 6 km/h.

3. If vehicle speed is below 6 km/h and the driver presses the driving
button (D) and brake pedal, then forward driving mode is enabled.

4. If vehicle speed is above 6 km/h then backward driving mode (R) is
inhibited.

The C code was provided by Centro Ricerche Fiat under a GPL-like license
and can be downloaded here:
https://drive.google.com/file/d/0B22MA57MHHBKainhQMmpEQlRWVG8
The C code was generated from a Simulink model that, unfortunately, has not
been disclosed.














As is typical for embedded software, the e-Park Control Unit software
consists of tasks that, after initialization of the system on start-up,
execute periodically in the control loop until system shutdown. A test
vector hence consists of a sequence of input values (sensor values and
messages received via the communication system) that may change in each
control loop iteration. We call the number of iterations the length of the
test vector.

To generate valid test vectors, a model of the vehicle is required.
Otherwise, the test vector generator may produce results that are known not
to occur in the running system, such as infinite vehicle velocity. For the
case study this model consisted of assumptions about the input value
ranges, such as "The speed of the car will not exceed 1000 km/h, or fall
below 0 km/h." These assumptions are part of the admissible operating
conditions as stated in the requirements specification.
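
Such range assumptions can be written down as a simple check over a test vector. The sketch below uses hypothetical type and field names and only encodes the speed range quoted above. In the actual workflow the assumptions constrain the generator itself, whereas this example merely filters a finished vector.

```c
#include <stdbool.h>
#include <stddef.h>

/* Inputs of one control loop iteration (hypothetical fields). */
typedef struct {
    int  speed_kmh;
    bool park_button;
    bool brake_pedal;
} step_input_t;

#define SPEED_MIN_KMH 0
#define SPEED_MAX_KMH 1000

/* A test vector of the given length is admissible if every iteration's
 * speed lies within the assumed operating range. */
bool vector_is_admissible(const step_input_t steps[], size_t length)
{
    for (size_t i = 0; i < length; i++) {
        if (steps[i].speed_kmh < SPEED_MIN_KMH ||
            steps[i].speed_kmh > SPEED_MAX_KMH)
            return false;
    }
    return true;
}
```

A vector of length 2 whose second iteration reports a speed of -5 km/h is rejected, even though it might well cover additional code.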


4.2 Experimental Setup 

In order to evaluate the FShell plugin for RVS, we used the C source code
of the e-Shift case study (approx. 4 KLOC) and started out with an initial
test suite consisting of 100 random test vectors uniformly distributed over
the admissible input ranges. Then we incrementally extended this test suite
by additional test vectors generated by the following two approaches:

1. FShell plugin for RVS following the process illustrated in Fig. 1.

2. A combination of test vector generation based on random search and
greedy test suite reduction.


Step  FShell plugin for RVS                 Random search + reduction
----  ------------------------------------  ------------------------------------
1.    Start with the initial test suite. (both approaches)
2.    Compile and run the C source code with the current test suite, using
      RapiCover to generate a coverage report. (both approaches)
3.    RapiCover provides FShell with a      -
      list of non-covered test goals.
4.    FShell generates a test vector for    Generate a random test vector,
      these non-covered test goals.         uniformly distributed over the
                                            admissible input ranges.
5.    FShell feeds back information about   -
      infeasible test goals and test
      vectors for feasible test goals.
6.    Create C test cases based on these test vectors. (both approaches)
7.    Re-compile and re-run the C code with this new test case, using
      RapiCover to verify that the generated test case does indeed cover
      the test goal. (both approaches)
8.    -                                     If the coverage has increased then
                                            keep the test case; otherwise
                                            discard it.
9.    Repeat from step 3. (both approaches)

Table 4. Experimental setup of the two approaches that we compare.
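
The greedy reduction used in the random-search approach (keep a test case only if it increases coverage) can be sketched as follows. The representation of coverage as a 32-bit mask and the function name are assumptions for illustration; the experiment measured coverage with RapiCover instead.

```c
#include <stddef.h>
#include <stdint.h>

/* cov[j] is the coverage of test vector j as a bitmask: bit i set means
 * that test goal i is covered by that vector. initial is the coverage of
 * the existing test suite. kept[j] is set to 1 if vector j adds coverage
 * and is kept, and to 0 if it is redundant and discarded.
 * Returns the number of vectors kept. */
size_t greedy_reduce(const uint32_t cov[], size_t n, uint32_t initial,
                     unsigned char kept[])
{
    uint32_t total = initial;
    size_t   count = 0;

    for (size_t j = 0; j < n; j++) {
        uint32_t merged = total | cov[j];
        kept[j] = (unsigned char)(merged != total);
        if (kept[j]) {
            total = merged;
            count++;
        }
    }
    return count;
}
```

The result depends on the order in which vectors arrive: a vector is judged only against what has been kept before it, which is what makes the reduction greedy rather than minimal.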






                        Initial      Random search                      FShell
                        test suite                                      plug-in
----------------------  ----------   -------  -------  -------  ------  -------
Runtime (hh:mm)         -            00:33    01:04    06:15    08:00   08:00
Generated test cases    -            500      1000     5000     6092    7
Thereof non-redundant   -            7        10       13       13      7
Total test cases        100          107      110      113      113     107
Statement coverage      51.9%        52.0%    52.1%    52.1%    52.1%   52.5%
  Increase                           0.1%     0.2%     0.2%     0.2%    0.6%
MC/DC coverage          28.4%        30.5%    31.1%    31.9%    31.9%   33.6%
  Increase                           2.1%     2.7%     3.5%     3.5%    5.2%

Table 5. Evaluation results: Comparing FShell plugin for RVS against test
vectors generated by random search.


We compared the achieved coverage gain and resulting test suite sizes after
running both approaches for 8 hours. Tab. 4 describes our experimental
setup.

The runtime of FShell is exponential in the loop bound of the main control
loop. Choosing too high a loop bound results in FShell taking prohibitively
long to run, yet setting the loop bound too low results in some branches
not being coverable. As mitigation, we started the experiment with a loop
bound of 1, then gradually increased the loop bound to cover those branches
that we were not able to cover in previous iterations. As explained in
Sec. 2.1, step 6 in Tab. 4 is not automatic since it needs information from
the requirements specification. Since our comparison does not depend on the
pass/fail status of the tests, we skipped the manual addition of the
expected test outcome.


4.3 Results 

We ran the experiment for 8 hours. The first approach spent more than 99%
of this time within FShell. Within this time frame, the loop bound reached
2, and thus not all branches could be covered. Nevertheless an increased
coverage was achieved as detailed in Tab. 5, which shows the baseline
coverage on the initial test suite (second column from the left) and the
increase in percentage of coverage gained by the tool in this experiment
(rightmost column). The code under test implements a state machine, which
consists mostly of decisions with very few functions and calls, which is
why we focussed on decision and statement coverage for our evaluation.

To underpin the benefit of our tool, we compared these results to the second
approach described in Tab. 4, a random search test generation strategy. Tab. 5
shows four snapshots of this search (middle columns) after exploring 500, 1000,
5000 and eventually 6092 test vectors of length 5 out of the admissible input
range.* The results show that more than 99.99% of the generated test vectors
added by the random search are redundant and do not increase the coverage
of the suite. This confirms that the system under test represents a particularly
challenging case for black-box test case generation and that only very few test
vectors in the input range lead to actual coverage increase.

* We chose length 5 because it seems a good compromise between increasing coverage
and keeping test execution times short for this case study: adding 100 test vectors
of length 5 increased coverage by 1.1%; 100 test vectors of length 10 increased it
by only 1.3% while test execution times would double and only half as many test
vectors could be explored.

On the other hand, the FShell plugin for RVS achieves a significantly larger
increase in the same amount of time in both statement and MC/DC coverage.
In addition to this, the plug-in achieves this increased coverage with only half
as many new test vectors as the random approach, leading to an overall smaller
and more efficient test suite.

This evaluation thus underlines the benefit from our tool integration to support
the coverage closure process on an industrial case study. The expected
reduction in manual work needs to be investigated in a broader industrial
evaluation involving verification engineers performing the entire coverage closure
process.

5 Background Context and Applicability

5.1 Novelty of the Approach

Much work exists on test case generation using model checking techniques,
but less of it is targeted directly at the high criticality safety domain,
where the criteria and frameworks for test case generation are restricted.
A useful survey relating to MC/DC can be found in [17]. In [7] Ghani
and Clark present a search based approach to generating test frameworks. There
are two issues with the approach presented. The first is that it is applied to
Java, a language which is rarely used for safety-critical software, and
particularly not for the most critical software. The second is more subtle: the
test cases were generated to ensure that the minimal set of truth tables for
MC/DC were exercised, but without consideration of the validity of any of the
test data. Additionally, we emphasize that our approach takes into account
existing coverage that has already been achieved and complements the
requirements based testing, rather than completely replacing it.

Other work, such as [?], looks at modification of the original source through
mutation testing in order to assess effectiveness of the tests. This could be
considered an adjunct to our methodology, but at present mutation testing is not
widely adopted by industry. Jones [10] considers test prioritization and test suite
reduction, but not new test case generation.

5.2 Wider Issues and Lessons Learnt

In order to encourage wider adoption of this integrated tool, we need to consider
where it would fit in users’ workflow and verification processes, as well as meeting
the practical requirements of the standard. As noted earlier, fully automated
code coverage testing is not desirable as it misses the intent of the requirements
based testing process. However, achieving full code coverage is a difficult task, 
and often requires a large amount of manual inspection of coverage results to 
examine what was missing. Hence providing the user with suggested test data 
is potentially very valuable and could improve productivity in one of the most 
time consuming and expensive parts of the safety certification process. 

Another benefit of integrating test case generation and coverage measurement
is test suite reduction. The coverage measurement tool returns for each test case
a list of covered goals. Test suite reduction is hence the computation of a minimal
set cover (an NP-complete problem). Approximation algorithms [?] may be used
to achieve this in reasonable runtimes.
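
As an illustration of such an approximation, the following sketch applies the standard greedy set-cover heuristic to per-test-case coverage data. The paper does not say which algorithm is used, so the choice of the greedy heuristic, the function name `greedy_reduce` and the example data are all assumptions.

```python
def greedy_reduce(coverage):
    """Greedy set cover: repeatedly pick the test case covering most open goals.

    `coverage` maps a test case name to the set of goals it covers.
    """
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        best = max(coverage, key=lambda tc: len(coverage[tc] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected

# Hypothetical coverage data as it might be reported per test case.
coverage = {
    "t1": {"g1", "g2"},
    "t2": {"g2", "g3"},
    "t3": {"g1", "g2", "g3"},
    "t4": {"g4"},
}
print(greedy_reduce(coverage))  # ['t3', 't4']
```

The reduced suite covers the same goals as the full one. The greedy result is not guaranteed to be minimal, which is the price paid for a polynomial runtime.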

FShell uses a class of semantically exact, but computationally expensive,
algorithms that rely on SAT solvers to solve NP-complete problems. Depending on
the programs or problems posed to the solver, the analysis may take a long time
to complete. Initial feedback on the tool showed that the concept was very well
received by automotive engineers. Speed was considered an issue; however,
keeping in mind that today’s practice for full coverage testing may take several
person months with an estimated cost of $100 per LOC,† there is great potential
for cutting down time and cost spent in verification by running an automated
tool in the background for a couple of days.

Initially, we sometimes failed to validate that a test vector generated
to cover a test goal actually covered that goal. One reason was imprecise
number representations in the test vector output. Using the exact hexadecimal
representation for floating point constants instead of the imprecise decimal one
fixed the problem. This highlights the value of bit-exact analysis as well as the
importance of re-validating coverage using RapiCover in the process (Fig. ?).
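
The effect can be reproduced in a few lines of Python. This is an illustration, not the tool's output format: the six-significant-digit decimal export, the helper names and the branch condition `x > 0.3` are assumptions made for the example.

```python
def export_decimal(value, digits=6):
    """Shortened decimal text, as an imprecise exporter might print it."""
    return f"{value:.{digits}g}"

def export_hex(value):
    """Exact hexadecimal floating point text (C99 '%a' style)."""
    return value.hex()

def takes_branch(x):
    """Hypothetical decision in the code under test."""
    return x > 0.3

x = 0.1 + 0.2                       # value that drives the branch
from_decimal = float(export_decimal(x))
from_hex = float.fromhex(export_hex(x))

print(takes_branch(x))              # branch taken with the original value
print(takes_branch(from_decimal))   # branch missed after the decimal round trip
print(takes_branch(from_hex))       # branch taken after the hexadecimal round trip
```

The hexadecimal text encodes every bit of the mantissa, so the reloaded value is identical to the generated one. The shortened decimal text rounds to a neighbouring value, which is enough to flip a comparison at the boundary.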

Note also that this process itself is independent of the tools used, which offers
a high degree of flexibility. On the one hand, it is planned that in future RVS will
support alternative backends in place of FShell. On the other hand, FShell can be
combined, without changing the picture in Fig. ?, with a mutation testing tool
(in place of RapiCover) to generate test vectors to improve mutation coverage.

6 Conclusion 

This paper has demonstrated the successful integration of the FShell tool with 
an industrial code coverage tool. Using the integrated tools we were able to 
increase MC/DC code coverage of an existing, sizeable test suite for an industrial 
automotive case study from 28.4% to 33.6%. When compared to a random
black-box test vector generation strategy, our approach was able to generate a
1.5 times higher coverage increase within the same amount of time. Our tool
achieves this coverage gain with half as many test vectors, and these test vectors
are much shorter than those generated by random search, leading to more compact
test suites and faster test execution cycles. Moreover, the integration of the two
tools simplifies test case generation and coverage measurement workflows into
a unified process.

† Atego. “ARINC 653 & Virtualization Solutions Architectures and Partitioning”,
Safety-Critical Tools Seminar, April 2012.

Future work will consider better integration with the debugging environment
to inspect test vectors, and warning the user about potentially unrealistic
environment assumptions such as ∞ for vehicle speed. In addition, better support
should be provided for exporting the test vectors into the users’ existing test
suite and testing framework.